We use Python for this:
Use what your colleagues (tend to) use
To analyse and visualise experimental data
Tabular (comma-separated) data
We can do this with a little programming
Before we begin…
cd ~/Desktop
mkdir python-novice-inflammation
cd python-novice-inflammation
LIVE DEMO
Before we begin…
cp 2017-03-23-standrews/lessons/python-01/data/python-novice-inflammation-data.zip ./
cp 2017-03-23-standrews/lessons/python-01/data/python-novice-inflammation-code.zip ./
unzip python-novice-inflammation-data.zip
unzip python-novice-inflammation-code.zip
(You can download the files via the Etherpad: http://pad.software-carpentry.org/2017-03-23-standrews)
LIVE DEMO
Jupyter
At the command line, start Jupyter notebook:
jupyter notebook
Jupyter landing page
variables)
Jupyter documents are composed of cells
A Jupyter cell can have one of several types
Markdown
Markdown allows us to enter formatted text.
Press Shift + Enter to run a cell
name, containing "Samia"
The print() function shows the contents of a variable
weight_kg = 55
print(weight_kg)
2.2 * weight_kg
print("weight in pounds", 2.2 * weight_kg)
weight_kg = 57.5
print("weight in kilograms is now:", weight_kg)
weight_lb = 2.2 * weight_kg
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
weight_kg = 100
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
What are the values in mass and age after the following code is executed?
mass = 47.5
age = 122
mass = mass * 2.0
age = age - 20
mass == 47.5, age == 122
mass == 95.0, age == 102
mass == 47.5, age == 102
mass == 95.0, age == 122
What does the following code print out?
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
Hopper Grace
Grace Hopper
"Grace Hopper"
"Hopper Grace"
Jupyter notebook or IPython terminal…
%whos will show you all defined variables
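The weight example above can be condensed into a short sketch of what reassignment does. It assumes nothing beyond plain Python; the values are the ones used in the demo.

```python
# A variable stores a value, not a formula.
weight_kg = 57.5
weight_lb = 2.2 * weight_kg   # computed once, from the current weight_kg

weight_kg = 100               # reassigning weight_kg ...
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
# ... leaves weight_lb at the value computed from 57.5

weight_lb = 2.2 * weight_kg   # recompute explicitly to bring it up to date
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
```

The first print still reports the pounds value derived from 57.5 kg; only the explicit recalculation changes it.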
data/inflammation-01.csv
$ head data/inflammation-01.csv
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1
0,0,1,2,2,4,2,1,6,4,7,6,6,9,9,15,4,16,18,12,12,5,18,9,5,3,10,3,12,7,8,4,7,3,5,4,4,3,2,1
0,0,2,2,4,2,2,5,5,8,6,5,11,9,4,13,5,12,10,6,9,17,15,8,9,3,13,7,8,2,8,8,4,2,3,5,4,1,1,1
0,0,1,2,3,1,2,3,5,3,7,8,8,5,10,9,15,11,18,19,20,8,5,13,15,10,6,10,6,7,4,9,3,5,2,5,3,2,2,1
0,0,0,3,1,5,6,5,5,8,2,4,11,12,10,11,9,10,17,11,6,16,12,6,8,14,6,13,10,11,4,6,4,7,6,3,2,1,0,0
0,1,1,2,1,3,5,3,5,8,6,8,12,5,13,6,13,8,16,8,18,15,16,14,12,7,3,8,9,11,2,5,4,5,1,4,1,2,0,0
numpy library
Python libraries
Python contains many powerful, general tools
import
import numpy
import seaborn
JUPYTER MAGIC
Jupyter is through magic
%pylab inline
import numpy
import seaborn
Jupyter notebooks
numpy, seaborn, pylab
numpy: work with matrices and arrays in Python
seaborn: attractive statistical summary graphs
pylab: numerical operations and visualisation in Python
Calling %pylab inline shows graphics within the notebook itself
numpy provides a function loadtxt() to load tabular data:
numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
loadtxt() belongs to numpy
fname: an argument expecting the path to a file
delimiter: an argument expecting the character that separates columns
... indicate missing rows or columns
1 == 1. == 1.0)
data
type(data)
print(data.dtype)
print(data.shape)
LIVE DEMO
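The loading step can be tried without the data file. The sketch below assumes only that numpy is installed. It feeds loadtxt() the first five values of the first three rows shown by head above, held in memory, and the name sample is a stand-in for the lesson's data.

```python
import io
import numpy

# First five values of the first three rows of data/inflammation-01.csv,
# kept in memory so the example does not need the file on disk.
csv_text = io.StringIO("0,0,1,3,1\n0,1,2,1,2\n0,1,1,3,3\n")

# loadtxt() accepts an open file-like object as well as a path
sample = numpy.loadtxt(fname=csv_text, delimiter=',')

print(type(sample))    # the kind of object loadtxt() returned
print(sample.dtype)    # the type of the values stored in the array
print(sample.shape)    # (number of rows, number of columns)
```

Although the file contains whole numbers, loadtxt() stores them as floating-point values by default.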
data
data.<attribute> e.g. data.shape
print('first value in data:', data[0, 0])
print('middle value in data:', data[30, 20])
LIVE DEMO
: (colon).
print(data[0:4, 0:10])
print(data[5:10, 0:10])
LIVE DEMO
Python assumes the first element
Python assumes the end element
QUESTION: What would : on its own indicate?
small = data[:3, 36:]
print('small is:')
print(small)
LIVE DEMO
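The slicing rules above can be checked on an array small enough to read at a glance. This is a minimal sketch; grid is a hypothetical stand-in for the inflammation data.

```python
import numpy

# A 3 x 4 stand-in array:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
grid = numpy.arange(12).reshape(3, 4)

top_left = grid[:2, :2]    # start omitted: slice begins at the first element
last_cols = grid[:, 2:]    # end omitted: slice runs through the last element
everything = grid[:, :]    # a bare colon selects the whole axis

print(top_left)
print(last_cols)
print(everything.shape)
```

As with data[0:4, 0:10], the start index is included and the end index is excluded.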
We can take slices of any series, not just arrays.
element = 'oxygen'
print('first three characters:', element[0:3])
first three characters: oxy
What is the value of element[:4]?
oxyg
gen
oxy
en
arrays know how to perform operations on their values
+, -, *, /, etc. are elementwise
doubledata = data * 2.0
print('original:')
print(data[:3, 36:])
print('doubledata:')
print(doubledata[:3, 36:])
tripledata = doubledata + data
print('tripledata:')
print(tripledata[:3, 36:])
LIVE DEMO
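Elementwise arithmetic is easiest to see on a tiny array. The sketch below mirrors the doubledata and tripledata steps on a hypothetical 2 x 3 array named sample.

```python
import numpy

# A small stand-in for the inflammation data
sample = numpy.array([[0.0, 1.0, 2.0],
                      [3.0, 4.0, 5.0]])

doubled = sample * 2.0       # every element is multiplied by 2.0
tripled = doubled + sample   # elements in matching positions are added

print('original:')
print(sample)
print('doubled:')
print(doubled)
print('tripled:')
print(tripled)
```

Neither operation changes sample itself; each produces a new array of the same shape.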
numpy functions
numpy provides functions to operate on arrays
print(numpy.mean(data))
maxval, minval, stdval = numpy.max(data), numpy.min(data), numpy.std(data)
print('maximum inflammation:', maxval)
print('minimum inflammation:', minval)
print('standard deviation:', stdval)
maxval, minval, stdval = data.max(), data.min(), data.std()
print('maximum inflammation:', maxval)
print('minimum inflammation:', minval)
print('standard deviation:', stdval)
LIVE DEMO
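The two spellings used above, numpy.max(data) and data.max(), give the same result. A minimal check on a hypothetical stand-in array:

```python
import numpy

# A small stand-in for the inflammation data
sample = numpy.array([[0.0, 1.0, 2.0],
                      [3.0, 4.0, 5.0]])

# Function form: the array is passed as an argument
func_max, func_min = numpy.max(sample), numpy.min(sample)

# Method form: the array's own method is called
meth_max, meth_min = sample.max(), sample.min()

print('maximum:', func_max, meth_max)
print('minimum:', func_min, meth_min)
print('mean:', numpy.mean(sample), sample.mean())
```

Which form to use is largely a matter of taste; the lesson uses both.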
patient_0 = data[0, :] # Row zero only, all columns
print('maximum inflammation for patient 0:', patient_0.max())
print('maximum inflammation for patient 0:', numpy.max(data[0, :]))
print('maximum inflammation for patient 2:', numpy.max(data[2, :]))
LIVE DEMO
numpy operations on axes
numpy functions take an axis= parameter: axis=0 gives one result per column, axis=1 gives one result per row
print(numpy.max(data, axis=1))
print(data.mean(axis=0))
LIVE DEMO
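The effect of axis= is easiest to check by looking at the shape of the result. The sketch below assumes, as in the lesson's data, that rows are patients and columns are days. The array sample is a hypothetical stand-in with 2 patients and 3 days.

```python
import numpy

# 2 patients (rows) x 3 days (columns)
sample = numpy.array([[0.0, 1.0, 2.0],
                      [3.0, 4.0, 8.0]])

# axis=1 works along each row: one value per patient
per_patient_max = numpy.max(sample, axis=1)

# axis=0 works down each column: one value per day
per_day_mean = sample.mean(axis=0)

print(per_patient_max, per_patient_max.shape)
print(per_day_mean, per_day_mean.shape)
```

The axis named in the call is the one that is collapsed, so the result has as many entries as the other axis.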
Here’s one I prepared earlier (for the Software Sustainability Institute):
matplotlib
matplotlib is the de facto standard plotting library in Python
We imported seaborn earlier, which makes matplotlib output nicer
We called %pylab inline earlier, which puts matplotlib output in the notebook
import matplotlib.pyplot
image = matplotlib.pyplot.imshow(data)
LIVE DEMO
matplotlib .imshow()
.imshow() renders matrix values as an image
matplotlib .plot()
.plot() renders a line graph
ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
LIVE DEMO
.mean() looks artificial
max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
LIVE DEMO
Can you create a plot showing the standard deviation (numpy.std()) of the inflammation data for each day across all patients?
fig = matplotlib.pyplot.figure()
ax = fig.add_subplot()
ax.set_ylabel()
ax.plot()
LIVE DEMO
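The four calls listed above can be combined into a figure with three panels side by side. This is a sketch under stated assumptions: sample is a hypothetical stand-in for the inflammation data, and the figure size and axis labels are illustrative choices.

```python
import numpy
import matplotlib.pyplot

# 3 patients (rows) x 4 days (columns), standing in for the real data
sample = numpy.array([[0.0, 1.0, 2.0, 1.0],
                      [0.0, 2.0, 4.0, 2.0],
                      [1.0, 3.0, 5.0, 1.0]])

fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

# add_subplot(rows, columns, index): a 1 x 3 grid, filled left to right
ax_mean = fig.add_subplot(1, 3, 1)
ax_max = fig.add_subplot(1, 3, 2)
ax_min = fig.add_subplot(1, 3, 3)

ax_mean.set_ylabel('average')
ax_mean.plot(numpy.mean(sample, axis=0))

ax_max.set_ylabel('max')
ax_max.plot(numpy.max(sample, axis=0))

ax_min.set_ylabel('min')
ax_min.plot(numpy.min(sample, axis=0))

fig.tight_layout()  # keep the axis labels from overlapping
```

In a notebook with inline plotting the figure appears below the cell; in a plain script, call matplotlib.pyplot.show() to display it.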
Can you modify the last plot to display the three graphs on top of one another, instead of side by side?
for loops